Feature Mining for Localised Crowd Counting

Authors

Ke Chen

Chen Change Loy

Shaogang Gong

Tony Xiang

Abstract

Crowd counting in public places has a wide spectrum of applications, especially in crowd control, public space design, and pedestrian behaviour profiling. Existing counting-by-regression methods, which aim to learn a direct mapping between low-level features and people count without segregating or tracking individuals, can be categorised as either global or local approaches. Global approaches [1, 3, 4] learn a single regression function between image features extracted globally from the entire image space and the total people count in that image space. Since spatial information is lost when computing global features, such a model implicitly assumes that a feature should be weighted the same regardless of where in the scene it is extracted. However, this assumption is largely invalid in real-world scenarios.

To overcome this limitation of the global approach, local models [5, 7] relax the global assumption to a certain extent by dividing the image space into cell regions, each of which is modelled by a separate regression function. However, existing local methods suffer from a scalability issue because they must learn multiple regression models, the number of which can become very large. In addition, an inherent drawback of existing local models is that no information is shared across spatially localised regions, so they cannot provide the more context-aware feature selection needed for more accurate crowd counting.

We consider localised feature importance mining and information sharing among regions to be two key factors for accurate and robust crowd counting, and both are missing in all existing techniques. To this end, we propose a single multi-output model for joint localised crowd counting based on ridge regression [6], which takes inter-dependent local features from local spatial regions as input and the people counts from individual regions as a multi-dimensional structured output.
Unlike global regression methods, our model relaxes the one-to-one mapping assumption by learning spatially localised regression functions jointly in a single model for all the individual cell regions in a scene; as such, our model can capture feature importance locally. Unlike existing approaches that build multiple local regression models, our single model is learned by joint optimisation to enforce dependencies among cell regions. Information from all local spatial regions can therefore be shared to achieve more reliable count prediction.

Figure 1 gives an overview of our framework:

(Step-1) We first infer a perspective normalisation map using the method described in [2].

(Step-2) Given a set of training images, we extract low-level imagery features, including local foreground, edge and texture features, from each cell region.

(Step-3) Local features from each cell are used to construct a local intermediate feature vector, and all local intermediate feature vectors are then concatenated into a single ordered (location-aware) feature vector.

(Step-4) A multi-output regression model based on multivariate ridge regression is trained using, as each training pair, the single concatenated feature vector and a vector whose elements are the actual counts in each region.

Given a new test frame, features are extracted and mapped through the learned regression model to generate a structured output that estimates the crowd count in every local region simultaneously.

For a training video frame $i$, where $i = 1, 2, \ldots, N$ and $N$ denotes the total number of training frames, we first partition the frame into $K$ cell regions (see Step-3 in Figure 1). We then extract low-level imagery features $z_i^j$ from each cell region $j$ and combine them into an intermediate feature vector $x_i \in \mathbb{R}^d$. We also concatenate the localised labelled ground truth $u_i^j$ from each cell region into a multi-dimensional output vector $y_i \in \mathbb{R}^m$, $i = 1, 2, \ldots, N$.
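The training and prediction steps described above can be sketched with closed-form multi-output ridge regression in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the regularisation weight `lam`, the synthetic data, and the use of a plain squared-error ridge objective are all assumptions introduced here, and the abstract does not specify the exact objective or feature extractors.

```python
import numpy as np


def concat_cell_features(cell_features):
    """Concatenate per-cell feature vectors in a fixed cell order.

    Keeping the order fixed makes the resulting vector location-aware,
    in the spirit of Step-3. `cell_features` is a sequence of K arrays.
    """
    return np.concatenate(
        [np.asarray(z, dtype=float).ravel() for z in cell_features]
    )


def fit_multi_output_ridge(X, Y, lam=1.0):
    """Closed-form ridge regression with one output column per cell.

    X: (N, d) concatenated feature vectors, one row per training frame.
    Y: (N, K) ground-truth counts, one column per cell region.
    lam: regularisation weight (hypothetical default, not from the paper).
    Returns W with shape (d, K) and an unregularised bias b with shape (K,).
    """
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc = X - x_mean  # centring leaves the bias out of the penalty
    Yc = Y - y_mean
    d = X.shape[1]
    W = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b


def predict_counts(X, W, b):
    """Estimate the counts of all K cells at once for each row of X."""
    return X @ W + b


# Synthetic placeholder data: 200 frames, K = 4 cells, 3 features per cell.
rng = np.random.default_rng(0)
K, per_cell, N = 4, 3, 200
cells = rng.normal(size=(N, K, per_cell))
X = np.stack([concat_cell_features(frame) for frame in cells])
W_true = rng.normal(size=(K * per_cell, K))
Y = X @ W_true + 0.5 + 0.01 * rng.normal(size=(N, K))

W, b = fit_multi_output_ridge(X, Y, lam=1e-3)
counts = predict_counts(X[:1], W, b)  # one row of K per-cell estimates
print(counts.shape)
```

In this sketch every cell's count is predicted from the features of all cells, which is one simple way for regions to share information; the joint optimisation used in the paper may couple the regions differently.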

Similar resources

Crowd Counting via Weighted VLAD on Dense Attribute Feature Maps

Crowd counting is an important task in computer vision, which has many applications in video surveillance. Although the regression-based framework has achieved great improvements for crowd counting, how to improve the discriminative power of image representation is still an open problem. Conventional holistic features used in crowd counting often fail to capture semantic attributes and spatial ...

Perform Three Data Mining Tasks with Crowdsourcing Process

For data mining studies, because of the complexity of doing feature selection process in tasks by hand, we need to send some of labeling to the workers with crowdsourcing activities. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the age or geography of the users' residence. Uncertainty about the performance of virtual user...

Crossing-Line Crowd Counting with Two-Phase Deep Neural Networks

In this paper, we propose a deep Convolutional Neural Network (CNN) for counting the number of people across a line-of-interest (LOI) in surveillance videos. It is a challenging problem and has many potential applications. Observing the limitations of temporal slices used by state-of-the-art LOI crowd counting methods, our proposed CNN directly estimates the crowd counts with pairs of video fra...

Scene Invariant Crowd Segmentation and Counting Using Scale-Normalized Histogram of Moving Gradients (HoMG)

The problem of automated crowd segmentation and counting has garnered significant interest in the field of video surveillance. This paper proposes a novel scene invariant crowd segmentation and counting algorithm designed with high accuracy yet low computational complexity in mind, which is key for widespread industrial adoption. A novel low-complexity, scale-normalized feature called Histogram...

Crowd Density and Counting Estimation Based on Image Textural Feature

This paper proposes an image textural analytical method for estimating the crowd density and counting. At first, the target detection is conducted to obtain the foreground image. This crowd image is used to calculate the gray level co-occurrence matrix (GLCM). Then, according to the characteristic values of the gray level co-occurrence matrix, i.e., energy, entropy, contrast, homogeneity, we us...


Publication date: 2012
